CIPS-SIGHAN Joint Conference on Chinese Language Processing, Beijing, China, August 28-29, 2010
Abstract
The authors propose that the current technology for Chinese word segmentation needs to change. The so-called segmentation should be divided into separate and distinct phases. First of all, we need to limit segmentation to the segmentation of Chinese characters instead of the so-called Chinese words. In character segmentation, we extract all the information about each character. Then we start a phase called Chinese morphological processing (CMP). The first step of CMP is to combine the separate characters. It is followed by post-segmentation processing, which covers all sorts of repetitive structures, Chinese-style abbreviations, the recognition and processing of pseudo-OOVs, etc. Most of the post-segmentation processing may have to be done by rule-based sub-routines, so we need to change the current corpus-based methodology by merging it with rule-based techniques.
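The phases named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, the function names, the use of forward maximum matching for the combination step, and the single AA-reduplication rule are all assumptions chosen for illustration only.

```python
# Hypothetical toy lexicon; the abstract does not specify any dictionary.
LEXICON = {"研究", "生命", "起源"}


def segment_characters(text):
    """Phase 1: split the text into characters and attach per-character information.

    Only one illustrative attribute is recorded here (whether the character
    lies in the basic CJK Unified Ideographs range).
    """
    return [{"char": ch, "is_han": "\u4e00" <= ch <= "\u9fff"} for ch in text]


def combine(chars, lexicon, max_len=4):
    """CMP step 1: combine characters into units.

    Forward maximum matching is an assumed stand-in; the abstract does not
    say how the combination is performed.
    """
    units, i = [], 0
    while i < len(chars):
        # Try the longest candidate first; fall back to a single character.
        for n in range(min(max_len, len(chars) - i), 0, -1):
            candidate = "".join(c["char"] for c in chars[i:i + n])
            if n == 1 or candidate in lexicon:
                units.append(candidate)
                i += n
                break
    return units


def merge_reduplication(units):
    """CMP post-segmentation, one rule only: join AA repetition (看 + 看 -> 看看)."""
    merged, i = [], 0
    while i < len(units):
        if i + 1 < len(units) and len(units[i]) == 1 and units[i] == units[i + 1]:
            merged.append(units[i] * 2)
            i += 2
        else:
            merged.append(units[i])
            i += 1
    return merged


def process(text, lexicon=LEXICON):
    """Run the three illustrative phases in order."""
    return merge_reduplication(combine(segment_characters(text), lexicon))
```

Calling `process("研究生命起源")` runs character segmentation, lexicon-based combination, and the reduplication rule in sequence. Abbreviations and pseudo-OOVs, which the abstract also assigns to post-segmentation processing, would need further rule-based sub-routines that are not sketched here.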
Similar resources
The Second CIPS-SIGHAN Joint Conference on Chinese Language Processing, 20-21 December 2012, Tianjin University, Tianjin, China
CRF-based Experiments for Cross-Domain Chinese Word Segmentation at CIPS-SIGHAN-2010
This paper describes our experiments on the cross-domain Chinese word segmentation task at the first CIPS-SIGHAN Joint Conference on Chinese Language Processing. Our system is based on the Conditional Random Fields (CRFs) model. Considering the particular properties of the out-of-domain data, we propose some novel steps to get some improvements for the special task.
Chinese Personal Name Disambiguation Based on Person Modeling
This document presents the bakeoff results of Chinese personal name in the First CIPS-SIGHAN Joint Conference on Chinese Language Processing. The authors introduce the frame of person disambiguation system LJPD, which uses a new person model. LJPD was built in short time, and it is not given enough training and adjustment. Evaluation on LJPD shows that the precision is competitive, but the reca...
SIR-NERD: A Chinese Named Entity Recognition and Disambiguation System using a Two-Stage Method
This paper presents our SIR-NERD system for the Chinese named entity recognition and disambiguation Task in the CIPS-SIGHAN joint conference on Chinese language processing (CLP2012). Our system uses a two-stage method and some key techniques to deal with the named entity recognition and disambiguation (NERD) task. Experimental results on the test data shows that the proposed system, which incor...
Word Segmentation on Chinese Mirco-Blog Data with a Linear-Time Incremental Model
This paper describes the model we designed for the word segmentation bakeoff on Chinese micro-blog data in the 2nd CIPS-SIGHAN joint conference on Chinese language processing. We presented a linear-time incremental model for word segmentation where rich features including character-based features, word-based features as well as other possible features can be easily employed. We report the perfo...